Linear and Quadratic Discriminant Analysis (LDA / QDA)#
Discriminant Analysis is a classic family of generative classifiers.
- LDA assumes each class is Gaussian with a shared covariance → linear decision boundaries
- QDA allows each class to have its own covariance → quadratic decision boundaries
Learning goals#
- understand the generative story: \(p(x\mid y)\) + Bayes → \(p(y\mid x)\)
- derive the discriminant score for LDA/QDA
- build intuition for when the boundary becomes linear vs quadratic
- implement LDA/QDA from scratch (NumPy)
- use scikit-learn's `LinearDiscriminantAnalysis` and `QuadraticDiscriminantAnalysis`
Table of contents#
- Generative vs discriminative intuition
- Gaussian class-conditional model
- QDA discriminant function (quadratic)
- LDA as a constrained QDA (linear)
- From-scratch LDA/QDA
- Visual decision boundaries (Plotly)
- When does QDA overfit?
- LDA as supervised dimensionality reduction
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
from plotly.subplots import make_subplots
from sklearn.datasets import make_blobs
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=4, suppress=True)
rng = np.random.default_rng(7)
1) Generative vs discriminative intuition#
Two worldviews:
Discriminative#
Learn \(p(y\mid x)\) directly (or a decision boundary).
Examples:
- logistic regression
- SVM
- neural networks
Generative#
Model how data is generated:

\[
p(x, y) = p(y)\,p(x \mid y)
\]

Then use Bayes:

\[
p(y \mid x) = \frac{p(x \mid y)\,p(y)}{\sum_{k} p(x \mid y=k)\,p(y=k)}
\]
Examples:
- Naive Bayes
- LDA / QDA
An analogy:
Discriminative models are like a judge who learns where to draw the line. Generative models are like a novelist who learns how each class produces data, then uses that story to decide.
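The Bayes step is easy to see with concrete numbers. A minimal sketch, with all densities and priors invented for illustration:

```python
import numpy as np

# Hypothetical ingredients of the generative recipe at a single point x:
# class-conditional densities p(x | y=k) and class priors p(y=k).
p_x_given_y = np.array([0.05, 0.20])  # p(x | y=0), p(x | y=1)
priors = np.array([0.7, 0.3])         # p(y=0),     p(y=1)

# Bayes' rule: p(y | x) = p(x | y) p(y) / sum_k p(x | y=k) p(y=k)
joint = p_x_given_y * priors
posterior = joint / joint.sum()

print(posterior.round(4))  # → [0.3684 0.6316]
```

Class 1 wins here despite its smaller prior, because its class-conditional density at \(x\) is four times larger.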
2) Gaussian class-conditional model#
Assume that for each class \(k\):

\[
x \mid y = k \;\sim\; \mathcal{N}(\mu_k, \Sigma_k)
\]

Then:

\[
\log p(x \mid y = k) = -\tfrac{1}{2}(x - \mu_k)^\top \Sigma_k^{-1}(x - \mu_k) - \tfrac{1}{2}\log\lvert\Sigma_k\rvert - \tfrac{d}{2}\log(2\pi)
\]
Add the log prior \(\log \pi_k\) and you get a discriminant score.
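As a sanity check, here is a hedged sketch that evaluates this log-density by hand and compares it against `scipy.stats.multivariate_normal` (SciPy ships as a scikit-learn dependency; `mu`, `Sigma`, and `x` are arbitrary toy values):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Toy parameters, assumed for illustration
mu = np.array([1.0, -0.5])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
x = np.array([0.5, 0.0])
d = mu.shape[0]

# log N(x; mu, Sigma) = -1/2 (x-mu)^T Sigma^{-1} (x-mu) - 1/2 log|Sigma| - d/2 log(2 pi)
diff = x - mu
maha = diff @ np.linalg.inv(Sigma) @ diff
logdet = np.linalg.slogdet(Sigma)[1]
log_pdf = -0.5 * maha - 0.5 * logdet - 0.5 * d * np.log(2 * np.pi)

# Cross-check against scipy's reference implementation
assert np.isclose(log_pdf, multivariate_normal(mu, Sigma).logpdf(x))
```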
# A 2D dataset where classes have different covariance shapes (QDA-friendly)
n = 900
mu0 = np.array([-2.0, -1.0])
mu1 = np.array([+2.0, +1.0])
Sigma0 = np.array([[1.8, 0.8], [0.8, 1.0]])
Sigma1 = np.array([[1.0, -0.6], [-0.6, 1.6]])
X0 = rng.multivariate_normal(mu0, Sigma0, size=n // 2)
X1 = rng.multivariate_normal(mu1, Sigma1, size=n // 2)
X = np.vstack([X0, X1])
y = np.array([0] * (n // 2) + [1] * (n // 2))
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=7, stratify=y)
fig = px.scatter(
x=X_tr[:, 0],
y=X_tr[:, 1],
color=y_tr.astype(str),
title="Training set (different class covariances)",
labels={"x": "x1", "y": "x2", "color": "class"},
)
fig.update_traces(marker=dict(size=6, opacity=0.65))
fig.update_layout(width=760, height=500)
fig.show()
def covariance_ellipse_points(mean: np.ndarray, cov: np.ndarray, n_points: int = 200, n_std: float = 2.0):
"""Return x,y points of an ellipse representing the covariance."""
mean = np.asarray(mean, dtype=float)
cov = np.asarray(cov, dtype=float)
vals, vecs = np.linalg.eigh(cov)
order = np.argsort(vals)[::-1]
vals = vals[order]
vecs = vecs[:, order]
# ellipse axes lengths
radii = n_std * np.sqrt(np.maximum(vals, 0.0))
t = np.linspace(0, 2 * np.pi, n_points)
circle = np.vstack([np.cos(t), np.sin(t)])
ellipse = (vecs @ (radii[:, None] * circle)).T
ellipse = ellipse + mean[None, :]
return ellipse[:, 0], ellipse[:, 1]
# Estimate means/covariances from training data
mu0_hat = X_tr[y_tr == 0].mean(axis=0)
mu1_hat = X_tr[y_tr == 1].mean(axis=0)
S0_hat = np.cov(X_tr[y_tr == 0].T, bias=False)
S1_hat = np.cov(X_tr[y_tr == 1].T, bias=False)
x0e, y0e = covariance_ellipse_points(mu0_hat, S0_hat, n_std=2.0)
x1e, y1e = covariance_ellipse_points(mu1_hat, S1_hat, n_std=2.0)
fig = go.Figure()
fig.add_trace(go.Scatter(x=X_tr[:, 0], y=X_tr[:, 1], mode="markers",
marker=dict(color=y_tr, colorscale="Viridis", size=6, opacity=0.6),
name="train"))
fig.add_trace(go.Scatter(x=x0e, y=y0e, mode="lines", name="class 0 (2σ ellipse)", line=dict(width=3)))
fig.add_trace(go.Scatter(x=x1e, y=y1e, mode="lines", name="class 1 (2σ ellipse)", line=dict(width=3)))
fig.update_layout(title="Empirical covariance ellipses", width=760, height=520)
fig.update_yaxes(scaleanchor="x", scaleratio=1)
fig.show()
3) QDA discriminant function (quadratic)#
For class \(k\), the log posterior (up to a shared constant) is:

\[
\delta_k(x) = \log \pi_k - \tfrac{1}{2}\log\lvert\Sigma_k\rvert - \tfrac{1}{2}(x - \mu_k)^\top \Sigma_k^{-1}(x - \mu_k)
\]

Pick the class with maximum score:

\[
\hat{y}(x) = \arg\max_k \, \delta_k(x)
\]
Because of the quadratic form \((x-\mu_k)^\top\Sigma_k^{-1}(x-\mu_k)\), the boundary is generally quadratic.
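A minimal sketch of scoring one point under two Gaussian classes, reusing the means and covariances of the synthetic dataset above:

```python
import numpy as np

def qda_score(x, mu, Sigma, prior):
    """delta_k(x) = log pi_k - 1/2 log|Sigma_k| - 1/2 (x-mu_k)^T Sigma_k^{-1} (x-mu_k)."""
    diff = x - mu
    maha = diff @ np.linalg.inv(Sigma) @ diff
    logdet = np.linalg.slogdet(Sigma)[1]
    return np.log(prior) - 0.5 * logdet - 0.5 * maha

mus = [np.array([-2.0, -1.0]), np.array([2.0, 1.0])]
Sigmas = [np.array([[1.8, 0.8], [0.8, 1.0]]), np.array([[1.0, -0.6], [-0.6, 1.6]])]
priors = [0.5, 0.5]

x = np.array([1.5, 1.0])  # a point close to class 1's mean
scores = [qda_score(x, mus[k], Sigmas[k], priors[k]) for k in range(2)]
print(int(np.argmax(scores)))  # → 1
```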
4) LDA as a constrained QDA (linear)#
LDA makes one simplifying assumption:

\[
\Sigma_k = \Sigma \quad \text{for all } k
\]

When covariances are shared, the quadratic term in \(x\) cancels when comparing classes. The discriminant becomes linear:

\[
\delta_k(x) = x^\top \Sigma^{-1} \mu_k - \tfrac{1}{2}\mu_k^\top \Sigma^{-1} \mu_k + \log \pi_k
\]
So the decision boundary is a hyperplane.
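One way to see the cancellation numerically: with a shared \(\Sigma\), the difference of two class scores is affine in \(x\). A small sketch with made-up parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[1.5, 0.4], [0.4, 1.0]])  # shared covariance (toy values)
P = np.linalg.inv(Sigma)
mu0, mu1 = np.array([-1.0, 0.0]), np.array([1.0, 0.5])
pi0 = pi1 = 0.5

def score(x, mu, pi):
    # the shared -1/2 log|Sigma| term is omitted: it is identical for every class
    diff = x - mu
    return np.log(pi) - 0.5 * diff @ P @ diff

# Expanding the quadratic forms, -1/2 x^T P x cancels in the difference:
# delta_1(x) - delta_0(x) = x^T P (mu1 - mu0)
#                           - 1/2 (mu1^T P mu1 - mu0^T P mu0) + log(pi1/pi0)
w = P @ (mu1 - mu0)
b = -0.5 * (mu1 @ P @ mu1 - mu0 @ P @ mu0) + np.log(pi1 / pi0)

for _ in range(5):
    x = rng.normal(size=2)
    assert np.isclose(score(x, mu1, pi1) - score(x, mu0, pi0), x @ w + b)
```

The boundary is the set where this affine difference equals zero, i.e. the hyperplane \(x^\top w + b = 0\).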
Intuition:
- LDA is “one common ellipse shape for everyone, but different centers”
- QDA is “each class gets its own ellipse shape and orientation”
5) From-scratch LDA/QDA#
class ScratchLDA:
def __init__(self, reg: float = 1e-6):
self.reg = float(reg)
def fit(self, X: np.ndarray, y: np.ndarray):
X = np.asarray(X, dtype=float)
y = np.asarray(y)
self.classes_, y_enc = np.unique(y, return_inverse=True)
K = self.classes_.shape[0]
n, d = X.shape
self.priors_ = np.bincount(y_enc, minlength=K).astype(float)
self.priors_ = self.priors_ / self.priors_.sum()
self.means_ = np.zeros((K, d), dtype=float)
for k in range(K):
self.means_[k] = X[y_enc == k].mean(axis=0)
# pooled covariance (unbiased)
S = np.zeros((d, d), dtype=float)
for k in range(K):
Xk = X[y_enc == k]
Xc = Xk - self.means_[k]
S += Xc.T @ Xc
S /= (n - K)
S = S + self.reg * np.eye(d)
self.cov_ = S
self.precision_ = np.linalg.inv(S)
# precompute terms for linear discriminant
self._w_ = (self.precision_ @ self.means_.T).T # (K,d)
self._b_ = -0.5 * np.sum(self.means_ * self._w_, axis=1) + np.log(self.priors_ + 1e-300)
return self
def decision_function(self, X: np.ndarray) -> np.ndarray:
X = np.asarray(X, dtype=float)
return X @ self._w_.T + self._b_[None, :]
def predict_proba(self, X: np.ndarray) -> np.ndarray:
scores = self.decision_function(X)
scores = scores - scores.max(axis=1, keepdims=True)
probs = np.exp(scores)
probs = probs / probs.sum(axis=1, keepdims=True)
return probs
def predict(self, X: np.ndarray) -> np.ndarray:
scores = self.decision_function(X)
return self.classes_[np.argmax(scores, axis=1)]
class ScratchQDA:
def __init__(self, reg: float = 1e-6):
self.reg = float(reg)
def fit(self, X: np.ndarray, y: np.ndarray):
X = np.asarray(X, dtype=float)
y = np.asarray(y)
self.classes_, y_enc = np.unique(y, return_inverse=True)
K = self.classes_.shape[0]
n, d = X.shape
self.priors_ = np.bincount(y_enc, minlength=K).astype(float)
self.priors_ = self.priors_ / self.priors_.sum()
self.means_ = np.zeros((K, d), dtype=float)
self.covs_ = np.zeros((K, d, d), dtype=float)
self.precisions_ = np.zeros((K, d, d), dtype=float)
self.logdets_ = np.zeros(K, dtype=float)
for k in range(K):
Xk = X[y_enc == k]
self.means_[k] = Xk.mean(axis=0)
Sk = np.cov(Xk.T, bias=False) + self.reg * np.eye(d)
self.covs_[k] = Sk
self.precisions_[k] = np.linalg.inv(Sk)
self.logdets_[k] = float(np.linalg.slogdet(Sk)[1])
return self
def decision_function(self, X: np.ndarray) -> np.ndarray:
X = np.asarray(X, dtype=float)
n = X.shape[0]
K = self.classes_.shape[0]
scores = np.zeros((n, K), dtype=float)
for k in range(K):
diff = X - self.means_[k]
maha = np.sum((diff @ self.precisions_[k]) * diff, axis=1)
scores[:, k] = (
np.log(self.priors_[k] + 1e-300)
- 0.5 * self.logdets_[k]
- 0.5 * maha
)
return scores
def predict_proba(self, X: np.ndarray) -> np.ndarray:
scores = self.decision_function(X)
scores = scores - scores.max(axis=1, keepdims=True)
probs = np.exp(scores)
probs = probs / probs.sum(axis=1, keepdims=True)
return probs
def predict(self, X: np.ndarray) -> np.ndarray:
scores = self.decision_function(X)
return self.classes_[np.argmax(scores, axis=1)]
# Fit scratch + sklearn models
scratch_lda = ScratchLDA(reg=1e-6).fit(X_tr, y_tr)
scratch_qda = ScratchQDA(reg=1e-6).fit(X_tr, y_tr)
sk_lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
sk_qda = QuadraticDiscriminantAnalysis(reg_param=1e-6).fit(X_tr, y_tr)
pred_lda_s = scratch_lda.predict(X_te)
pred_qda_s = scratch_qda.predict(X_te)
pred_lda = sk_lda.predict(X_te)
pred_qda = sk_qda.predict(X_te)
print("Scratch LDA accuracy:", accuracy_score(y_te, pred_lda_s))
print("Scratch QDA accuracy:", accuracy_score(y_te, pred_qda_s))
print("sklearn LDA accuracy:", accuracy_score(y_te, pred_lda))
print("sklearn QDA accuracy:", accuracy_score(y_te, pred_qda))
print("\nClassification report (sklearn QDA):")
print(classification_report(y_te, pred_qda, digits=3))
Scratch LDA accuracy: 0.9666666666666667
Scratch QDA accuracy: 0.9814814814814815
sklearn LDA accuracy: 0.9666666666666667
sklearn QDA accuracy: 0.9814814814814815
Classification report (sklearn QDA):
precision recall f1-score support
0 0.985 0.978 0.981 135
1 0.978 0.985 0.982 135
accuracy 0.981 270
macro avg 0.982 0.981 0.981 270
weighted avg 0.982 0.981 0.981 270
Visual decision boundaries (Plotly)#
def plot_boundary(model, X, y, title: str, grid_steps: int = 260):
x_min, x_max = X[:, 0].min() - 1.2, X[:, 0].max() + 1.2
y_min, y_max = X[:, 1].min() - 1.2, X[:, 1].max() + 1.2
xs = np.linspace(x_min, x_max, grid_steps)
ys = np.linspace(y_min, y_max, grid_steps)
xx, yy = np.meshgrid(xs, ys)
grid = np.c_[xx.ravel(), yy.ravel()]
proba = model.predict_proba(grid)[:, 1].reshape(xx.shape)
fig = go.Figure()
fig.add_trace(go.Contour(
x=xs,
y=ys,
z=proba,
colorscale="RdBu",
opacity=0.75,
contours=dict(showlines=False),
colorbar=dict(title="P(class=1)"),
))
fig.add_trace(go.Scatter(
x=X[:, 0],
y=X[:, 1],
mode="markers",
marker=dict(color=y, colorscale="Viridis", size=6, line=dict(width=0.5, color="white")),
name="data",
))
fig.update_layout(title=title, width=780, height=540)
fig.update_yaxes(scaleanchor="x", scaleratio=1)
return fig
fig1 = plot_boundary(scratch_lda, X_te, y_te, "Scratch LDA decision surface")
fig1.show()
fig2 = plot_boundary(scratch_qda, X_te, y_te, "Scratch QDA decision surface")
fig2.show()
fig3 = plot_boundary(sk_lda, X_te, y_te, "sklearn LDA decision surface")
fig3.show()
fig4 = plot_boundary(sk_qda, X_te, y_te, "sklearn QDA decision surface")
fig4.show()
6) When does QDA overfit?#
QDA estimates a full covariance matrix per class.
For \(d\) features, a full covariance has \(d(d+1)/2\) parameters.
So QDA can be data-hungry:
- in low dimensions with enough data → QDA can be great
- in high dimensions or with small datasets → QDA can overfit
LDA shares one covariance across classes, so it has fewer parameters and is often more stable.
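A quick sketch of the parameter counts behind this argument (per-class means, one or more covariances, and priors for \(K\) classes in \(d\) dimensions; the helper names are made up here):

```python
def n_params_lda(d: int, K: int) -> int:
    # K mean vectors + one shared covariance + K-1 free prior probabilities
    return K * d + d * (d + 1) // 2 + (K - 1)

def n_params_qda(d: int, K: int) -> int:
    # K mean vectors + K full covariances + K-1 free prior probabilities
    return K * d + K * d * (d + 1) // 2 + (K - 1)

for d in (2, 10, 50):
    print(f"d={d}: LDA {n_params_lda(d, 2)} params, QDA {n_params_qda(d, 2)} params")
```

Each extra class costs QDA another \(d(d+1)/2\) covariance entries, which is where the data hunger comes from.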
def sample_gaussian_dataset(n_samples: int, seed: int = 0):
r = np.random.default_rng(seed)
mu0 = np.array([-2.0, -1.0])
mu1 = np.array([+2.0, +1.0])
Sigma0 = np.array([[1.8, 0.8], [0.8, 1.0]])
Sigma1 = np.array([[1.0, -0.6], [-0.6, 1.6]])
X0 = r.multivariate_normal(mu0, Sigma0, size=n_samples // 2)
X1 = r.multivariate_normal(mu1, Sigma1, size=n_samples // 2)
X = np.vstack([X0, X1])
y = np.array([0] * (n_samples // 2) + [1] * (n_samples // 2))
return X, y
train_sizes = [40, 80, 140, 220, 350, 500, 800]
acc_lda = []
acc_qda = []
for n_train in train_sizes:
Xs, ys = sample_gaussian_dataset(n_train + 400, seed=7 + n_train)
X_tr, X_te, y_tr, y_te = train_test_split(Xs, ys, test_size=400, random_state=7, stratify=ys)
m_lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
m_qda = QuadraticDiscriminantAnalysis(reg_param=1e-6).fit(X_tr, y_tr)
acc_lda.append(accuracy_score(y_te, m_lda.predict(X_te)))
acc_qda.append(accuracy_score(y_te, m_qda.predict(X_te)))
fig = go.Figure()
fig.add_trace(go.Scatter(x=train_sizes, y=acc_lda, mode="lines+markers", name="LDA"))
fig.add_trace(go.Scatter(x=train_sizes, y=acc_qda, mode="lines+markers", name="QDA"))
fig.update_layout(title="Test accuracy vs training size", xaxis_title="# training samples", yaxis_title="accuracy", width=780, height=460)
fig.show()
7) LDA as supervised dimensionality reduction#
LDA is also used as a supervised projection technique.
It finds directions \(w\) that separate classes by maximizing the Fisher ratio of between-class scatter \(S_B\) to within-class scatter \(S_W\):

\[
J(w) = \frac{w^\top S_B\, w}{w^\top S_W\, w}
\]

In scikit-learn, LinearDiscriminantAnalysis(n_components=...) can transform data into a lower-dimensional space (at most \(K - 1\) components for \(K\) classes).
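Under the hood, the discriminant directions can be obtained from the generalized eigenproblem \(S_W^{-1} S_B\, w = \lambda w\). A from-scratch sketch of that recipe (not scikit-learn's actual solver, which is SVD-based, but equivalent in spirit):

```python
import numpy as np
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=600, n_features=3, centers=3, random_state=0)
overall_mean = X.mean(axis=0)
d = X.shape[1]

S_W = np.zeros((d, d))  # within-class scatter
S_B = np.zeros((d, d))  # between-class scatter
for k in np.unique(y):
    Xk = X[y == k]
    mk = Xk.mean(axis=0)
    S_W += (Xk - mk).T @ (Xk - mk)
    diff = (mk - overall_mean)[:, None]
    S_B += Xk.shape[0] * (diff @ diff.T)

# Top eigenvectors of S_W^{-1} S_B give the discriminant directions
# (with 3 classes, S_B has rank <= 2, so at most 2 useful components)
vals, vecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
order = np.argsort(vals.real)[::-1]
W = vecs[:, order[:2]].real
Z = (X - overall_mean) @ W  # projected data
print(Z.shape)  # → (600, 2)
```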
# 3-class dataset → LDA projection to 2D
X3, y3 = make_blobs(
n_samples=1200,
centers=[(-2, 0, 0), (2, 0, 0), (0, 2, 2)],
cluster_std=[1.4, 1.4, 1.4],
random_state=7,
)
X3_tr, X3_te, y3_tr, y3_te = train_test_split(X3, y3, test_size=0.3, random_state=7, stratify=y3)
lda_proj = LinearDiscriminantAnalysis(n_components=2)
lda_proj.fit(X3_tr, y3_tr)
Z = lda_proj.transform(X3_te)
fig = px.scatter(x=Z[:, 0], y=Z[:, 1], color=y3_te.astype(str), title="LDA projection (2 components)", labels={"x": "LD1", "y": "LD2", "color": "class"})
fig.update_traces(marker=dict(size=6, opacity=0.7))
fig.update_layout(width=760, height=500)
fig.show()
print("LDA projection explained variance ratio (approx):", getattr(lda_proj, 'explained_variance_ratio_', None))
LDA projection explained variance ratio (approx): [0.5896 0.4104]
Summary#
- LDA/QDA are generative classifiers built from Gaussian class-conditional models.
- QDA uses class-specific covariances → more flexible, but more parameters.
- LDA shares covariance → linear boundaries and better stability with limited data.
- LDA can also be used for supervised dimensionality reduction.
Exercises#
1. Generate a dataset where covariances are truly equal and compare LDA vs QDA.
2. Increase dimensionality (e.g. \(d=20\)) and watch QDA become unstable unless you regularize.
3. Compare Gaussian Naive Bayes (diagonal covariance) vs LDA (full covariance).